DataGarage: Warehousing Massive Performance Data on Commodity Servers
Authors
Abstract
Contemporary datacenters house tens of thousands of servers. The servers are closely monitored for operating conditions and utilization by collecting their performance data (e.g., CPU utilization). In this paper, we show that existing database and file-system solutions are not suitable for warehousing performance data collected from a large number of servers because of the scale and the complexity of performance data. We describe the design and implementation of DataGarage, a performance data warehousing system that we have developed at Microsoft. DataGarage is a hybrid solution that combines benefits of DBMSs, file-systems, and MapReduce systems to address unique challenges of warehousing performance data. We describe how DataGarage, running on commodity servers, allows efficient storage and analysis of years of historical performance data collected from many tens of thousands of servers. We also report DataGarage’s performance with a real dataset on a 32-node, 256-core shared-nothing cluster, and our experience using DataGarage at Microsoft over the past year.
Similar resources
DataGarage: Warehousing Massive Amounts of Performance Data on Commodity Servers
Contemporary datacenters house tens of thousands of servers. The servers are closely monitored for operating conditions and utilizations by collecting their performance data (e.g., CPU utilization). In this paper, we show that existing database and file-system solutions are not suitable for warehousing performance data collected from a large number of servers because of the scale and the comple...
Mist: Efficient Dissemination of Erasure-coded Data in Data Centers
Data centers store a massive amount of data in a large number of servers built by commodity hardware. To maintain data integrity against server failures, erasure codes have been extensively deployed in modern data centers to provide a higher level of failure tolerance with less storage overhead than replication. Yet, compared to replication, disseminating erasure-coded data from a source server...
Data warehousing with Oracle
With the emergence of data warehousing, Decision Support Systems have evolved to its best. At the core of these warehousing systems lies a good database management system. Database server, used for data warehousing, is responsible to provide robust data management, scalability, high performance query processing and integration with other servers. Oracle being the initiator in warehousing server...
Understanding Dimension Volatility in Data Warehouses (or Bin There Done That)
Introduction Data warehousing has become an increasingly important technology in many organizations, integrating disparate sources of data for decision-making, planning, and policy formulation. Data warehousing applications can be sources of competitive advantage. Even if the raw data is widely available, the strategies for integrating, analyzing, and acting on the information can be differenti...
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce
Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous ...
Journal:
- PVLDB
Volume: 3
Publication date: 2010